Optimized free energies from bidirectional single-molecule force spectroscopy 
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An optimized method for estimating path-ensemble averages using data from processes driven in 
opposite directions is presented. Based on this estimator, bidirectional expressions for reconstructing 
free energies and potentials of mean force from single-molecule force spectroscopy — valid for biasing 
potentials of arbitrary stiffness — are developed. Numerical simulations on a model potential indicate 
that these methods perform better than unidirectional strategies. 



Crooks' path-ensemble average theorem (Eq. []} en- 
compasses a set of exact results in nonequilibrium sta- 
tistical mechanics pertinent to systems driven from ther- 
mal equilibrium by a time-dependent external potential 
[l[ . These include Jarzynski's equality Q and the Crooks 
fluctuation theorem [3|, which relate equilibrium free en- 
ergy differences to the nonequilibrium work distribution, 
as well as reweighting relations that allow one to recover 
arbitrary equilibrium ensemble averages from measure- 
ments of driven nonequilibrium processes [lj. Because 
of the intimate connection between such processes and 
molecular force spectroscopy, these theorems have been 
widely invoked to extract free energies and potentials of 
mean force (PMFs) from single-molecule pulling experi- 
ments @, 1, 1, 0, 1 . 

While formally correct, the practical utility of these 
relations is limited by the presence of exponential aver- 
ages of the work, which are dominated by rare events 
and therefore have notoriously slow convergence proper- 
ties Q. In order to improve their convergence, strate gies 
such as work- weighted trajectory sampling 13, II , 12, H} 
have been proposed. Here we suggest another method 
to accelerate the convergence of these averages: includ- 
ing trajectories from the reverse process in the forward 
path-ensemble. This is motivated in part by the observa- 
tion that the exponential average of the work in the for- 
ward process is dominated by those rare trajectories that 
resemble time-reversed counterparts ( "conjugate twins" ) 
of typical trajectories generated by the reverse protocol 
Thus, our goals are to construct optimized forward 
path-ensemble average estimators that explicitly include 
such trajectories, and apply them to the problem of es- 
timating free energies and potentials of mean force from 
single-molecule pulling experiments. 

The starting point of our analysis is Crooks' path- 
ensemble average theorem, which relates the forward av- 
erage of an arbitrary functional T = J-[T] of the phase 
space trajectory T = {q(t),p(t)} to its work- weighted av- 
erage in the reverse process, namely [lj 



jr e -l3(W+AF)\ 

I R. ' 



(1) 



In the above, the forward average (...} F is an average 
over all trajectories (path-ensemble average) generated 
in the forward process, wherein an external parameter 



(e.g. the position of a harmonic trap in a single-molecule 
pulling experiment) is driven from the value A to B in 
r units of time after equilibration at A, while (...)# is a 
similarly defined average in the reverse direction, from 
B to A. The total work W[T] accumulated up to the 
final time r is defined in terms of the time-dependent 
Hamiltonian H = H(q(t),p(t);t) as W = J^(dH/dt)dt, 
while AF = Fb — Fa is the free energy difference between 
the equilibrium states corresponding to the endpoints A 
and B. Finally, the notation T = T\t\ is a shorthand 
for the value of the functional when evaluated over the 
time-reversal of T, viz. T = {q(r — t), —p(r — t)}. 

By choosing F[T'] = 5[T - V] in Eq. Q and using 
the property W[T] = —W[T], one obtains an identity be- 
tween the distribution of trajectories in the two directions 

BLEU, 



Pf(T) 



_ a(W-AF) 



(2) 



where Pf(T) and pr(T) are the probabilities of observing 
a particular trajectory T in the forward and reverse pro- 
cesses, respectively. This result offers a means of achiev- 
ing the aforementioned goal — trajectories from the re- 
verse process can indeed be included in the forward path- 
ensemble when their density is reweighted by e@( w ~ AF K 
Our next goal is to optimally combine direct estimates of 
/?p(r) from forward processes with indirect estimates ob- 
tained from pr(T) via Eq. this will be done with the 
weighted histogram analysis method (WHAM) [l7| • 
The objective of WHAM is to find an optimal (i.e. 
least variance) estimator for a desired probability distri- 
bution from a series of independent estimates of biased 
distributions, where "biased" here means that the distri- 
bution of interest is related to the remaining ones by a 
simple reweighting factor. To be specific, given a series 
of normalized distributions p\{x) of a random variable x, 
with i — 1, . . . , M , and M unbiasing relations of the form 



p(x) = fi(x)p\{x), 



(3) 



where p(x) is the distribution of interest and fi(x) is the 
unbiasing factor for the i-th distribution, the WHAM 
strategy seeks a linear combination of M independent es- 
timates of p{x) obtained from the measured biased distri- 
butions Pi(x) via Eq. ([3]), such that its variance cr 2 [p(a;)] 
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is minimized. This results in 1(| 17 1 



p{x) 



(4) 



where rij is the number of samples in the estimate of the 
i-th distribution. (For notational simplicity, here we do 
not distinguish the exact distribution and its sample esti- 
mate). Applied to the problem of estimating Pf(T) from 



J 



n F forward and hr reverse trajectories, Eqs. ©-(HJ give 
an optimized estimator for the forward probability distri- 
bution of trajectories in terms of the measured forward 
and reverse densities: 



Pf(T) = 



n F p F (T) + n R p R (T) 



-P(W-AF) 



(5) 



We are now ready to derive the main results of our paper. Taking the average of .F[r] using the optimized density 
from Eq. ([5]), we obtain the following estimator for the forward path-ensemble average of T: 



tif T 



n F + n R e-P( w - AF ) 



n R T 



n F + n R e 



p(W+AF) 



(6) 



where in the last average we have again used the property that the total work is odd under time-reversal. (An analogous 
expression for the reverse path-ensemble can be obtained by switching the definitions of forward and reverse.) This 
general result forms the basis of our bidirectional method, and different applications can be obtained with suitable 
choices of T. 

Our first example is concerned with free energy differences, where we choose T = e~P w S, with = W^[T] defined 
as the partial work between times a and b along the trajectory T, i.e. W% = f (dH/dt) dt. (Note that, according to 
this notation, the total work W coincides with Wq). Invoking Jarzynski's equality e _,3 ( Ft_FA ) = (e~ l3W °)p for the 
l.h.s. of Eq. ([6]), this choice of T gives 



\n F + n R e-^ w -^ F ) 




nRe f3(W+AF) 



where AFt = F t — Fa is the free energy difference between the equilibrium states defined by the Hamiltonians 

H(q,p;t) and H(q,p;0), and in the last average we have used the property Wg[r] = — W,T_t[r]. For the particular 

cases where t = or t = r, this result can be rearranged to yield the Bennett Acceptance Ratio (BAR) formula 

for AF [l8j], as generalized to nonequilibrium processes by Crooks [l[ (for a multistate extension, see [191). Thc 

above equation further generalizes BAR to estimate intermediate free energy differences AF t . Operationally, when 

estimating an intermediate free energy difference, we must first estimate AF to use in the r.h.s. of Eq. 0. This can 

be accomplished with BAR, which has been shown to be a maximum likelihood estimator of AF 201 1. 

! 



Free energy differences can also be estimated using a 
cumulant expansion of Jarzynski's equality [6|. In or- 
der to analyze bidirectional data with this approach, one 
should apply Eq. ([6]) to estimate moments of the work dis- 
tribution, choosing T — W n . This is more rigorous than 
a method which applies the Crooks fluctuation theorem 
between states which are not in equilibrium [21| . A bidi- 
rectional estimator for the energetic contribution to AFt 
can be obtained by choosing T — H(q,p;t)e~ l3 ^ w "^ AFt ' > 
in Eq. ([6|). This results in the average energy at time t, 
as was shown in the unidirectional case [22l |. 

In the context of single- molecule pulling experiments, 
the system is typically driven out of equilibrium by a 
time-dependent potential Vt — V(z t ;t) acting on a col- 
lective coordinate Zt = z(q(t)) (e.g. the end-to-end dis- 
tance of a protein) such that the total Hamiltonian is 
H = Ho + Vt, where Hq is the (time- independent) Hamil- 



tonian in the absence of the external perturbation. In this 
case, the free energy difference AFt involves the equilib- 
rium states of the system corresponding to the potential 
at Vt and V$. However, one is often more interested in 
the potential of mean force Gq(z) of the unperturbed 
Hamiltonian, i.e. in the effective potential dictating the 
equilibrium distribution of z-values in the absence of the 
external potential. Although in the limit of sufficiently 
stiff potentials the free energy difference approaches the 
PMF this approximation fails for soft springs (23| 
such as those used in optical tweezer experiments Q, in 
which case one should use more rigorous methods. One 
approach starts from the observation that the equilib- 
rium distribution of z-values in the absence of thc ex- 
ternal potential (i.e. the unbiased distribution) is given 
by p Q (z) = C^e^^iSiz - z t )e- 0w o) F @, %, where 
C = (e~P( w °~ Vt >) p is an overall normalization constant, 
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which can be shown to be independent of t. With this 
result in mind, a bidirectional estimator for po(z) can be 
obtained from Eq. © by choosing T — 6(z — z t )e _/3W V 
Moreover, since this expression for po(z) is correct for all 
times t, different estimates of po(z) can be obtained from 
different time-slices during the pulling process, and these 
can in turn be combined according to the WHAM pre- 
scription (Eqs. ([3]) and ([4])). Indeed, rewriting the above 
result for po(z) in the form of Eq. |3]), viz. 



where the factor e ~P AF t — {e~^ w °)p is introduced to 
normalize the distribution in square brackets, one arrives 
at the Hummer-Szabo estimator for Po(z) 0,(1], 



-/3G (z) _ 



J2 t (5(z-z t )e-^ w o) F e^ 

J2 t e -0[V(z;t)-AF t ] 



(9) 



where Go(z) = —(3 1 lnpo(z) is defined up to an additive 
constant. The above PMF formalism has been extended 



Po(z) = C~ 



(S(z - z t )e-P w °) 

P -0AF t 



(8) 



to account for multiple pulling protocols [24j and multiple 
dimensions 12511. 
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In order to optimally include trajectories from the reverse perturbation in Eq. ([9]), we choose T = S(z — z t )e 
in Eq. (|6|) and substitute the ensuing expression for (5(z — zt)e~^ w °) F in Eq. ((9J. This leads to our bidirectional 
PMF estimator: 



,-0G o {z) _ 



n F 5(z-z t )e-e w 



n K !)(z-z T -t)e T - 



/3AF t 



e -f3[V(z;t)-AF t ] 



(10) 



where AF t is estimated via Eq. ((TJ) and AF = AF r via BAR. (As in WHAM, AF t can also 
be estimated self-consistently by iterative cycles of Eq. (fTU|) and numerically integrating AFt = 
j e -f3[G (z)+V(z;t)] dz / f e -0[Go(z')+v(z';O)] dz , j If we switch the definitions of forward and reverse, then G Q (z) dif- 
fers by the constant AF. 

I 



To demonstrate these results, we perform Browman 
dynamics simulations on a one-dimensional potential 
whose unperturbed Hamiltonian is Hq(z) = (5z 3 — 10z + 
3)z (as used by Hummer 26]). The time-dependent 
Hamiltonian is H(z; t) = H {z) + V(z; t), with V(z; t) — 
k s (z — z(t)) 2 /2 and k s chosen as 15. In the forward di- 
rection, the center of the potential z(t) is linearly varied 
from -1.5 to 1.5 over 750 steps; it is varied from 1.5 to 
-1.5 in the reverse direction. Before pulling, trajectories 
are equilibrated for 100 steps. Dynamics are run with a 
diffusion coefficient D = 1, temperature parameter (3—1, 
and time step At = 0.001. Work is calculated with the 
discrete formula W h a = + At )^ + Ai ) - 

H(z(t + At);t)]. 



For AFt estimates on this model system, our bidirec- 
tional strategy outperforms existent methods (Fig. []}. 
Unidirectional estimates of AF t based on Jarzynski's 
equality are markedly biased as the states are further 
perturbed from the starting equilibrium state. Chelli and 
coworkers have also developed an asymptotically correct 
bidirectional estimator that reduces to BAR at the end 
states (Eq. (16) in [27[). However, their derivation is 



limited to deterministic systems, and although we have 
empirical evidence that their estimator approaches the 
correct AFt for Brownian simulations in the limit of a 
large number of trajectories (data not shown), it leads 
to a more pronounced bias than Eq. ([7]) for the simula- 
tions under the above conditions (Fig. [T]). We suspect 




FIG. 1: (Color online) Comparison of Aft estimators: 
Jarzynski's equality applied to 500 forward (rightward tri- 
angles) or reverse pullings (leftward triangles, time reversed 
so that AFt = AF at t = 0.75); our optimized estimator, 
Eq. J7J) (upward triangles) and Eq. (16) of Ref. H3] (dotted 
line) applied to 250 pullings in each direction. The exact 
AF, calculated by applying Gauss- Kronrod quadrature in 
MATLAB 7.5 to numerically integrate / e~ l3H(z;t) dz between 
z(t) — 5 < z < z(t) + 5, is shown as a shaded line. 



that this slower convergence is due to the use of an un- 
optimized Jarzynski estimator (see Eq. (7) in 27]), but 
since a general derivation of their results is not yet avail- 
able, it is presently difficult to verify this as the source of 
discrepancy between these two estimators, and we leave 
this question for future investigations. 

Our PMF reconstruction methods also compare favor- 
ably with unidirectional methods (Fig. As with AFt, 
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reconstructed PMFs from separate forward and reverse 
processes increasingly overestimate the PMF farther from 
region sampled by the original state. In contrast, our 
bidirectional formula, Eq. (jlOp , optimally combines the 
data to reduce this bias. As the method of Chclli and 
coworkers for PMF reconstruction requires a stiff-spring 
assumption, it is not applicable here. 
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FIG. 2: (Color online) Comparison of PMF estimators: (a) 
Hummer and Szabo's method, Eq. ©, applied to 500 forward 
(rightward triangles) or reverse (leftward triangles) pullings. 
(b) Our estimator, Eq. (|10p . applied to 250 forward and 250 
reverse pullings (upward triangles). The shaded line is the 
exact PMF. PMFs from forward and bidirectional data are 
shifted to align with the exact PMF at z — —1.25; for the 
reverse, they are aligned at z — 1.25. In (b), the harmonic 
potential used in our pullings is shown as a dashed line. 



In summary, building on the observation that the con- 
vergence of Jarzynski's nonequilibrium work average is 
dominated by time-reversed counterparts of trajectories 
generated via the reverse process 14} , we have introduced 
a formula that optimally includes such trajectories in 
generic nonequilibrium path-averages (Eq. ©). As an 
application of this result, we have derived a bidirectional 
estimator for free energy differences in terms of nonequi- 
librium measurements of work (Eq. |(7J)). Although it re- 
duces to BAR for the special case of endpoint free energy 
differences AF, our formula also allows for the estimation 
of intermediate values AF t of the free energy during the 
switching process. When applied to the problem of esti- 
mating potentials of mean force Gq(z) in nonequilibrium 
force spectroscopy, our methods yield a bidirectional esti- 
mator for Gq(z) that optimally combines time-slices from 
forward and reverse measurements of position and work 
(Eq. (|10p ). Numerical comparison of our formula with 
unidirectional estimates based on the Jarzynski equality 
[2] or the Hummer-Szabo method 0, @] reveal that our 
reconstructed free energy differences are of better overall 
quality than these unidirectional estimators, which are 
increasingly biased as one drives the system farther away 
from its original equilibrium state. It has been noted 
that faster pullings farther from equilibrium contain less 
instrument noise and therefore lead to more accurate free 
energy estimates [28|. It is thus expected that our bidi- 
rectional estimators will further improve the quality of 



such experimental estimates by appreciably reducing the 
finite-sample bias most evident in fast pullings. 
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